<<<<<<< HEAD ======= >>>>>>> 460c7f7e30cb43650b8fd3b6aad3724488382916

Survival Analysis

Survival analysis

  • So far, have seen:

    • response variable counted or measured (regression)

    • response variable categorized (logistic regression)

  • But what if response is time until event (eg. time of survival after surgery)?

  • Additional complication: event might not have happened at end of study (eg. patient still alive). But knowing that patient has “not died yet” presumably informative. Such data called censored.

  • Enter survival analysis, in particular the “Cox proportional hazards model”.

  • Explanatory variables in this context often called covariates.

Packages

  • Install package survival if not done. Also use broom and marginaleffects from earlier.
<<<<<<< HEAD
library(tidyverse)
library(survival)
library(survminer)
library(broom)
library(marginaleffects)
=======
library(tidyverse)
library(survival)
library(broom)
library(marginaleffects)
>>>>>>> 460c7f7e30cb43650b8fd3b6aad3724488382916

Example: still dancing?

  • 12 women who have just started taking dancing lessons are followed for up to a year, to see whether they are still taking dancing lessons, or have quit. The “event” here is “quit”.

  • This might depend on:

    • a treatment (visit to a dance competition)

    • woman’s age (at start of study).

Data

Months  Quit   Treatment Age
1        1        0      16
2        1        0      24
2        1        0      18
3        0        0      27
4        1        0      25
7        1        1      26
8        1        1      36
10       1        1      38
10       0        1      45
12       1        1      47

About the data

  • months and quit are kind of combined response:

    • Months is number of months a woman was actually observed dancing

    • quit is 1 if woman quit, 0 if still dancing at end of study.

  • Treatment is 1 if woman went to dance competition, 0 otherwise.

  • Fit model and see whether Age or Treatment have effect on survival.

  • Want to do predictions for probabilities of still dancing as they depend on whatever is significant, and draw plot.

Read data

  • Column-aligned:
<<<<<<< HEAD
url <- "http://ritsokiguess.site/datafiles/dancing.txt"
dance <- read_table(url)
=======
url <- "http://ritsokiguess.site/datafiles/dancing.txt"
dance <- read_table(url)
>>>>>>> 460c7f7e30cb43650b8fd3b6aad3724488382916

The data

<<<<<<< HEAD
dance
=======
dance
>>>>>>> 460c7f7e30cb43650b8fd3b6aad3724488382916

Fit model

  • Response variable has to incorporate both the survival time (Months) and whether or not the event, quitting, happened (that is, if Quit is 1).
  • This is made using Surv from survival package, with two inputs:
    • the column that has the survival times
    • something that is TRUE or 1 if the event happened.
  • Easiest for us to create this when we fit the model, predicting response from explanatories:
<<<<<<< HEAD
dance %>% mutate(mth = Surv(Months, Quit)) -> dance
dance
  • Then fit model, predicting mth from explanatories:
dance.1 <- coxph(mth ~ Treatment + Age, data = dance)
=======
dance.1 <- coxph(Surv(Months, Quit) ~ Treatment + Age, data = dance)
>>>>>>> 460c7f7e30cb43650b8fd3b6aad3724488382916

Output looks a lot like regression

<<<<<<< HEAD
summary(dance.1)
=======
summary(dance.1)
>>>>>>> 460c7f7e30cb43650b8fd3b6aad3724488382916
Call:
coxph(formula = Surv(Months, Quit) ~ Treatment + Age, data = dance)

  n= 12, number of events= 10 

              coef exp(coef) se(coef)      z Pr(>|z|)  
Treatment -4.44915   0.01169  2.60929 -1.705   0.0882 .
Age       -0.36619   0.69337  0.15381 -2.381   0.0173 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

          exp(coef) exp(-coef) lower .95 upper .95
Treatment   0.01169     85.554 7.026e-05    1.9444
Age         0.69337      1.442 5.129e-01    0.9373

Concordance= 0.964  (se = 0.039 )
Likelihood ratio test= 21.68  on 2 df,   p=2e-05
Wald test            = 5.67  on 2 df,   p=0.06
Score (logrank) test = 14.75  on 2 df,   p=6e-04

Conclusions

  • Use \(\alpha=0.10\) here since not much data.

  • Three tests at bottom like global F-test. Consensus that something predicts survival time (whether or not dancer quit and/or how long it took).

  • Age (definitely), Treatment (marginally) both predict survival time.

Behind the scenes

  • All depends on hazard rate, which is based on probability that event happens in the next short time period, given that event has not happened yet:

  • \(X\) denotes time to event, \(\delta\) is small time interval:

  • \(h(t) = P(X \le t + \delta | X \ge t) / \delta\)

  • if \(h(t)\) large, event likely to happen soon (lifetime short)

  • if \(h(t)\) small, event unlikely to happen soon (lifetime long).

Modelling lifetime

  • want to model hazard rate

  • but hazard rate always positive, so actually model log of hazard rate

  • modelling how (log-)hazard rate depends on other things eg \(X_1 =\) age, \(X_2 =\) treatment, with the \(\beta\) being regression coefficients:

  • Cox model \(h(t)=h_0(t)\exp(\beta_0+\beta_1X_1+\beta_2 X_2+\cdots)\), or:

  • \(\log(h(t)) = \log(h_0(t)) + \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots\)

  • like a generalized linear model with log link.

Predictions with marginaleffects

  • marginaleffects knows about survival models (with sufficient care)
  • Predicted survival probabilities depend on:
    • the combination of explanatory variables you are looking at
    • the time at which you are looking at them (when more time has passed, it is more likely that the event has happened, so the “survival probability” should be lower).
  • look at effect of age by comparing ages 20 and 40, and later look at the effect of treatment (values 1 and 0).
  • Also have to provide some times to predict for, in Months.

Effect of age

new <- datagrid(model = dance.1, Age = c(20, 40), Months = c(3, 5, 7))
new

These are actually for women who did not go to the dance competition.

The predictions

cbind(predictions(dance.1, newdata = new, type = "survival")) %>% 
  select(Age, Treatment, Months, estimate)

The estimated survival probabilities go down over time. For example a 20-year-old woman here has estimated probability 0.0293 of still dancing after 5 months.

A graph

We can plot the predictions over time for an experimental condition such as age. The key for plot_predictions is to put time first in the condition:

plot_predictions(dance.1, condition = c("Months", "Age"), 
                 type = "survival")

Comments

  • The plot picks some representative ages.
  • It is (usually) best to be up and to the right (has the highest chance of surviving longest).
  • Hence the oldest women have the best chance to still be dancing longest (the youngest women are most likely to quit soonest).

The effect of treatment

The same procedure will get predictions for women who did or did not go to the dance competition, at various times:

new <- datagrid(model = dance.1, Treatment = c(0, 1), Months = c(3, 5, 7))
new

The age used for predictions is the mean of all ages.

The predictions

cbind(predictions(dance.1, newdata = new, type = "survival")) %>% 
  select(Age, Treatment, Months, estimate)

Women of this age have a high (0.879) chance of still dancing after 7 months if they went to the dance competition, but much lower (0.165) if they did not.

A graph

In condition, put the time variable first, and then the effect of interest:

plot_predictions(dance.1, condition = c("Months", "Treatment"), 
                 type = "survival")

Comments

  • The survival curve for Treatment 1 is higher all the way along
  • Hence at any time, the women who went to the dance competition have a higher chance of still dancing than those who did not.

The model summary again

summary(dance.1)
Call:
coxph(formula = Surv(Months, Quit) ~ Treatment + Age, data = dance)

  n= 12, number of events= 10 

              coef exp(coef) se(coef)      z Pr(>|z|)  
Treatment -4.44915   0.01169  2.60929 -1.705   0.0882 .
Age       -0.36619   0.69337  0.15381 -2.381   0.0173 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

          exp(coef) exp(-coef) lower .95 upper .95
Treatment   0.01169     85.554 7.026e-05    1.9444
Age         0.69337      1.442 5.129e-01    0.9373

Concordance= 0.964  (se = 0.039 )
Likelihood ratio test= 21.68  on 2 df,   p=2e-05
Wald test            = 5.67  on 2 df,   p=0.06
Score (logrank) test = 14.75  on 2 df,   p=6e-04

Comments

  • The numbers in the coef column describe effect of that variable on log-hazard of quitting.
  • Both numbers are negative, so a higher value on both variables goes with a lower hazard of quitting:
    • an older woman is less likely to quit soon (more likely to be still dancing)
    • a woman who went to the dance competition (Treatment = 1) is less likely to quit soon vs. a woman who didn’t (more likely to be still dancing).

Model checking

  • With regression, usually plot residuals against fitted values.

  • Not quite same here (nonlinear model), but “martingale residuals” should have no pattern vs. “linear predictor”.

  • Use broom ideas to get them, in .resid and .fitted as below.

  • Martingale residuals can go very negative, so won’t always look normal.

Martingale residuals

<<<<<<< HEAD
ggcoxdiagnostics(dance.1) 

Predicted survival probs

  • The function we use is called survfit, though actually works rather like predict.
  • First create a data frame of values to predict from. We’ll do all combos of ages 20 and 40, treatment and not, using datagrid to get all the combos:
treatments <- c(0, 1)
ages <- c(20, 40)
dance.new <- datagrid(model = dance.1, Treatment = treatments, Age = ages)
dance.new

The predictions

One prediction for each time for each combo of age and treatment in dance.new:

s <- survfit(dance.1, newdata = dance.new, data = dance)
summary(s)
Call: survfit(formula = dance.1, newdata = dance.new, data = dance)

 time n.risk n.event survival1 survival2 survival3 survival4
    1     12       1  8.76e-01  1.00e+00  9.98e-01     1.000
    2     11       2  3.99e-01  9.99e-01  9.89e-01     1.000
    4      8       1  1.24e-01  9.99e-01  9.76e-01     1.000
    5      7       1  2.93e-02  9.98e-01  9.60e-01     1.000
    7      6       1 2.96e-323  6.13e-01  1.70e-04     0.994
    8      5       1  0.00e+00  2.99e-06  1.35e-98     0.862
   10      4       1  0.00e+00  0.00e+00  0.00e+00     0.000
   11      2       1  0.00e+00  0.00e+00  0.00e+00     0.000
   12      1       1  0.00e+00  0.00e+00  0.00e+00     0.000

Conclusions from predicted probs

  • Older women more likely to be still dancing than younger women (compare “profiles” for same treatment group).

  • Effect of treatment seems to be to increase prob of still dancing (compare “profiles” for same age for treatment group vs. not)

  • Would be nice to see this on a graph. This is ggsurvplot from package survminer:

g <- ggsurvplot(s, conf.int = F)

“Strata” (groups)

  • uses “strata” thus (dance.new):

Plotting survival probabilities

g

Discussion

  • Survivor curve farther to the right is better (better chance of surviving longer).

  • Best is age 40 with treatment, worst age 20 without.

  • Appears to be:

    • age effect (40 better than 20)

    • treatment effect (treatment better than not)

    • In analysis, treatment effect only marginally significant.

=======
dance.1 %>% augment(dance) %>% 
  ggplot(aes(x = .fitted, y = .resid)) + geom_point() + geom_smooth()
>>>>>>> 460c7f7e30cb43650b8fd3b6aad3724488382916

A more realistic example: lung cancer

The variables

Uh oh, missing values

<<<<<<< HEAD
lung
=======
lung
>>>>>>> 460c7f7e30cb43650b8fd3b6aad3724488382916

A closer look

<<<<<<< HEAD
summary(lung)
=======
summary(lung)
>>>>>>> 460c7f7e30cb43650b8fd3b6aad3724488382916
      inst            time            status           age       
 Min.   : 1.00   Min.   :   5.0   Min.   :1.000   Min.   :39.00  
 1st Qu.: 3.00   1st Qu.: 166.8   1st Qu.:1.000   1st Qu.:56.00  
 Median :11.00   Median : 255.5   Median :2.000   Median :63.00  
 Mean   :11.09   Mean   : 305.2   Mean   :1.724   Mean   :62.45  
 3rd Qu.:16.00   3rd Qu.: 396.5   3rd Qu.:2.000   3rd Qu.:69.00  
 Max.   :33.00   Max.   :1022.0   Max.   :2.000   Max.   :82.00  
 NA's   :1                                                       
      sex           ph.ecog          ph.karno        pat.karno     
 Min.   :1.000   Min.   :0.0000   Min.   : 50.00   Min.   : 30.00  
 1st Qu.:1.000   1st Qu.:0.0000   1st Qu.: 75.00   1st Qu.: 70.00  
 Median :1.000   Median :1.0000   Median : 80.00   Median : 80.00  
 Mean   :1.395   Mean   :0.9515   Mean   : 81.94   Mean   : 79.96  
 3rd Qu.:2.000   3rd Qu.:1.0000   3rd Qu.: 90.00   3rd Qu.: 90.00  
 Max.   :2.000   Max.   :3.0000   Max.   :100.00   Max.   :100.00  
                 NA's   :1        NA's   :1        NA's   :3       
    meal.cal         wt.loss       
 Min.   :  96.0   Min.   :-24.000  
 1st Qu.: 635.0   1st Qu.:  0.000  
 Median : 975.0   Median :  7.000  
 Mean   : 928.8   Mean   :  9.832  
 3rd Qu.:1150.0   3rd Qu.: 15.750  
 Max.   :2600.0   Max.   : 68.000  
 NA's   :47       NA's   :14       

Remove obs with any missing values

<<<<<<< HEAD
lung %>% drop_na() -> lung.complete
lung.complete %>%
  select(meal.cal:wt.loss) %>%
  slice(1:10)
=======
lung %>% drop_na() -> lung.complete
lung.complete %>%
  select(meal.cal:wt.loss) %>%
  slice(1:10)
>>>>>>> 460c7f7e30cb43650b8fd3b6aad3724488382916

Missing values seem to be gone.

Check!

<<<<<<< HEAD
summary(lung.complete)
=======
summary(lung.complete)
>>>>>>> 460c7f7e30cb43650b8fd3b6aad3724488382916
      inst            time            status           age       
 Min.   : 1.00   Min.   :   5.0   Min.   :1.000   Min.   :39.00  
 1st Qu.: 3.00   1st Qu.: 174.5   1st Qu.:1.000   1st Qu.:57.00  
 Median :11.00   Median : 268.0   Median :2.000   Median :64.00  
 Mean   :10.71   Mean   : 309.9   Mean   :1.719   Mean   :62.57  
 3rd Qu.:15.00   3rd Qu.: 419.5   3rd Qu.:2.000   3rd Qu.:70.00  
 Max.   :32.00   Max.   :1022.0   Max.   :2.000   Max.   :82.00  
      sex           ph.ecog          ph.karno        pat.karno     
 Min.   :1.000   Min.   :0.0000   Min.   : 50.00   Min.   : 30.00  
 1st Qu.:1.000   1st Qu.:0.0000   1st Qu.: 70.00   1st Qu.: 70.00  
 Median :1.000   Median :1.0000   Median : 80.00   Median : 80.00  
 Mean   :1.383   Mean   :0.9581   Mean   : 82.04   Mean   : 79.58  
 3rd Qu.:2.000   3rd Qu.:1.0000   3rd Qu.: 90.00   3rd Qu.: 90.00  
 Max.   :2.000   Max.   :3.0000   Max.   :100.00   Max.   :100.00  
    meal.cal         wt.loss       
 Min.   :  96.0   Min.   :-24.000  
 1st Qu.: 619.0   1st Qu.:  0.000  
 Median : 975.0   Median :  7.000  
 Mean   : 929.1   Mean   :  9.719  
 3rd Qu.:1162.5   3rd Qu.: 15.000  
 Max.   :2600.0   Max.   : 68.000  

No missing values left.

Model 1: use everything except inst

<<<<<<< HEAD
names(lung.complete)
=======
names(lung.complete)
>>>>>>> 460c7f7e30cb43650b8fd3b6aad3724488382916
 [1] "inst"      "time"      "status"    "age"       "sex"      
 [6] "ph.ecog"   "ph.karno"  "pat.karno" "meal.cal"  "wt.loss"  
<<<<<<< HEAD
lung.complete %>% 
   mutate(resp = Surv(time, status == 2)) ->
   lung.complete
lung.1 <- coxph(resp ~ . - inst - time - status,
  data = lung.complete
)
=======
lung.1 <- coxph(Surv(time, status == 2) ~ . - inst - time - status,
  data = lung.complete
)
>>>>>>> 460c7f7e30cb43650b8fd3b6aad3724488382916

“Dot” means “all the other variables”.

summary of model 1

<<<<<<< HEAD
summary(lung.1)
=======
summary(lung.1)
>>>>>>> 460c7f7e30cb43650b8fd3b6aad3724488382916
Call:
coxph(formula = Surv(time, status == 2) ~ . - inst - time - status, 
    data = lung.complete)

  n= 167, number of events= 120 

                coef  exp(coef)   se(coef)      z Pr(>|z|)   
age        1.080e-02  1.011e+00  1.160e-02  0.931  0.35168   
sex       -5.536e-01  5.749e-01  2.016e-01 -2.746  0.00603 **
ph.ecog    7.395e-01  2.095e+00  2.250e-01  3.287  0.00101 **
ph.karno   2.244e-02  1.023e+00  1.123e-02  1.998  0.04575 * 
pat.karno -1.207e-02  9.880e-01  8.116e-03 -1.488  0.13685   
meal.cal   2.835e-05  1.000e+00  2.594e-04  0.109  0.91298   
wt.loss   -1.420e-02  9.859e-01  7.766e-03 -1.828  0.06748 . 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

          exp(coef) exp(-coef) lower .95 upper .95
age          1.0109     0.9893    0.9881    1.0341
sex          0.5749     1.7395    0.3872    0.8534
ph.ecog      2.0950     0.4773    1.3479    3.2560
ph.karno     1.0227     0.9778    1.0004    1.0455
pat.karno    0.9880     1.0121    0.9724    1.0038
meal.cal     1.0000     1.0000    0.9995    1.0005
wt.loss      0.9859     1.0143    0.9710    1.0010

Concordance= 0.653  (se = 0.029 )
Likelihood ratio test= 28.16  on 7 df,   p=2e-04
Wald test            = 27.5  on 7 df,   p=3e-04
Score (logrank) test = 28.31  on 7 df,   p=2e-04

Overall significance

The three tests of overall significance:

<<<<<<< HEAD
glance(lung.1) %>% select(starts_with("p.value"))
=======
glance(lung.1) %>% select(starts_with("p.value"))
>>>>>>> 460c7f7e30cb43650b8fd3b6aad3724488382916

All strongly significant. Something predicts survival.

Coefficients for model 1

<<<<<<< HEAD
tidy(lung.1) %>% select(term, p.value) %>% arrange(p.value)
=======
tidy(lung.1) %>% select(term, p.value) %>% arrange(p.value)
>>>>>>> 460c7f7e30cb43650b8fd3b6aad3724488382916
  • sex and ph.ecog definitely significant here

  • age, pat.karno and meal.cal definitely not

  • Take out definitely non-sig variables, and try again.

Model 2

<<<<<<< HEAD
lung.2 <- update(lung.1, . ~ . - age - pat.karno - meal.cal)
tidy(lung.2) %>% select(term, p.value)
=======
lung.2 <- update(lung.1, . ~ . - age - pat.karno - meal.cal)
tidy(lung.2) %>% select(term, p.value)
>>>>>>> 460c7f7e30cb43650b8fd3b6aad3724488382916

Compare with first model:

<<<<<<< HEAD
anova(lung.2, lung.1)
=======
anova(lung.2, lung.1)
>>>>>>> 460c7f7e30cb43650b8fd3b6aad3724488382916
  • No harm in taking out those variables.

Model 3

Take out ph.karno and wt.loss as well.

<<<<<<< HEAD
lung.3 <- update(lung.2, . ~ . - ph.karno - wt.loss)
tidy(lung.3) %>% select(term, estimate, p.value)
=======
lung.3 <- update(lung.2, . ~ . - ph.karno - wt.loss)
tidy(lung.3) %>% select(term, estimate, p.value)
>>>>>>> 460c7f7e30cb43650b8fd3b6aad3724488382916
<<<<<<< HEAD
summary(lung.3)
=======
summary(lung.3)
>>>>>>> 460c7f7e30cb43650b8fd3b6aad3724488382916
Call:
coxph(formula = Surv(time, status == 2) ~ sex + ph.ecog, data = lung.complete)

  n= 167, number of events= 120 

           coef exp(coef) se(coef)      z Pr(>|z|)    
sex     -0.5101    0.6004   0.1969 -2.591 0.009579 ** 
ph.ecog  0.4825    1.6201   0.1323  3.647 0.000266 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

        exp(coef) exp(-coef) lower .95 upper .95
sex        0.6004     1.6655    0.4082    0.8832
ph.ecog    1.6201     0.6172    1.2501    2.0998

Concordance= 0.641  (se = 0.031 )
Likelihood ratio test= 19.48  on 2 df,   p=6e-05
Wald test            = 19.35  on 2 df,   p=6e-05
Score (logrank) test = 19.62  on 2 df,   p=5e-05

Check whether that was OK

<<<<<<< HEAD
anova(lung.3, lung.2)
=======
anova(lung.3, lung.2)
>>>>>>> 460c7f7e30cb43650b8fd3b6aad3724488382916

Just OK.

Commentary

Plotting survival probabilities

<<<<<<< HEAD
sexes <- c(1, 2)
ph.ecogs <- 0:3
lung.new <- datagrid(sex = sexes, ph.ecog = ph.ecogs, model = lung.3)
lung.new
======= >>>>>>> 460c7f7e30cb43650b8fd3b6aad3724488382916

Effect of sex:

<<<<<<< HEAD
s <- survfit(lung.3, newdata = lung.new, data = lung)
summary(s)
Call: survfit(formula = lung.3, newdata = lung.new, data = lung)

 time n.risk n.event survival1 survival2 survival3 survival4
    5    167       1     0.996    0.9932   0.98908  0.982370
   11    166       1     0.992    0.9865   0.97825  0.965007
   12    165       1     0.987    0.9798   0.96743  0.947763
   13    164       1     0.983    0.9730   0.95660  0.930639
   15    163       1     0.979    0.9662   0.94577  0.913634
   26    162       1     0.975    0.9594   0.93502  0.896868
   30    161       1     0.970    0.9526   0.92427  0.880221
   31    160       1     0.966    0.9457   0.91352  0.863694
   53    159       2     0.957    0.9320   0.89222  0.831293
   54    157       1     0.953    0.9252   0.88162  0.815359
   59    156       1     0.949    0.9183   0.87103  0.799544
   60    155       2     0.940    0.9046   0.85000  0.768509
   61    153       1     0.936    0.8977   0.83959  0.753310
   62    152       1     0.931    0.8908   0.82922  0.738301
   65    151       1     0.927    0.8840   0.81894  0.723524
   79    150       1     0.922    0.8771   0.80865  0.708862
   81    149       1     0.918    0.8703   0.79845  0.694431
   88    148       1     0.913    0.8635   0.78835  0.680252
   92    147       1     0.909    0.8567   0.77830  0.666254
   93    146       1     0.904    0.8499   0.76829  0.652435
   95    145       2     0.895    0.8362   0.74834  0.625202
  107    142       1     0.891    0.8293   0.73836  0.611749
  110    141       1     0.886    0.8224   0.72843  0.598481
  118    140       1     0.882    0.8155   0.71856  0.585392
  135    139       1     0.877    0.8085   0.70860  0.572303
  142    138       1     0.872    0.8015   0.69869  0.559396
  145    137       1     0.868    0.7945   0.68884  0.546672
  147    136       1     0.863    0.7875   0.67907  0.534172
  153    135       1     0.858    0.7806   0.66940  0.521891
  156    134       2     0.849    0.7667   0.65018  0.497831
  163    132       3     0.834    0.7457   0.62158  0.462841
  166    129       1     0.829    0.7386   0.61212  0.451483
  167    128       1     0.825    0.7316   0.60266  0.440233
  170    127       1     0.820    0.7245   0.59329  0.429201
  176    124       1     0.815    0.7174   0.58386  0.418196
  179    122       1     0.810    0.7102   0.57446  0.407344
  180    121       1     0.805    0.7031   0.56512  0.396669
  181    120       2     0.794    0.6887   0.54650  0.375711
  183    118       1     0.789    0.6815   0.53728  0.365495
  197    114       1     0.784    0.6741   0.52779  0.355101
  199    112       1     0.778    0.6665   0.51827  0.344779
  201    111       1     0.773    0.6590   0.50882  0.334648
  207    108       1     0.768    0.6514   0.49933  0.324595
  210    107       1     0.762    0.6438   0.48991  0.314732
  212    105       1     0.756    0.6361   0.48045  0.304945
  218    104       1     0.751    0.6283   0.47099  0.295276
  222    102       1     0.745    0.6205   0.46148  0.285683
  223    100       1     0.739    0.6124   0.45186  0.276090
  226     97       1     0.733    0.6042   0.44210  0.266498
  229     96       1     0.727    0.5961   0.43248  0.257165
  230     95       1     0.720    0.5879   0.42294  0.248038
  239     93       1     0.714    0.5797   0.41343  0.239068
  245     90       1     0.708    0.5714   0.40388  0.230184
  246     89       1     0.702    0.5632   0.39447  0.221557
  267     85       1     0.695    0.5548   0.38501  0.213013
  268     84       1     0.689    0.5465   0.37569  0.204723
  269     83       1     0.682    0.5382   0.36652  0.196684
  270     81       1     0.676    0.5299   0.35738  0.188799
  283     79       1     0.669    0.5215   0.34827  0.181068
  284     78       1     0.662    0.5131   0.33926  0.173537
  285     76       2     0.649    0.4962   0.32136  0.158944
  286     74       1     0.642    0.4878   0.31259  0.151979
  288     73       1     0.635    0.4795   0.30397  0.145249
  291     72       1     0.628    0.4711   0.29536  0.138637
  293     69       1     0.621    0.4622   0.28642  0.131905
  301     66       1     0.614    0.4532   0.27745  0.125276
  303     64       1     0.606    0.4441   0.26841  0.118730
  305     62       1     0.598    0.4348   0.25937  0.112320
  310     61       1     0.590    0.4256   0.25052  0.106180
  320     60       1     0.582    0.4163   0.24180  0.100256
  337     58       1     0.574    0.4071   0.23320  0.094542
  345     57       1     0.566    0.3980   0.22479  0.089081
  348     56       1     0.558    0.3890   0.21657  0.083866
  351     55       1     0.550    0.3801   0.20859  0.078916
  353     54       2     0.534    0.3623   0.19306  0.069614
  361     52       1     0.526    0.3536   0.18556  0.065290
  363     51       2     0.510    0.3362   0.17097  0.057176
  371     49       1     0.502    0.3275   0.16390  0.053395
  390     45       1     0.494    0.3186   0.15677  0.049681
  426     42       1     0.485    0.3092   0.14934  0.045926
  428     41       1     0.476    0.2999   0.14215  0.042394
  429     40       1     0.467    0.2908   0.13517  0.039074
  433     39       1     0.457    0.2816   0.12833  0.035920
  444     38       1     0.448    0.2726   0.12175  0.032987
  450     37       1     0.439    0.2636   0.11532  0.030210
  455     36       1     0.430    0.2548   0.10910  0.027616
  457     35       1     0.421    0.2460   0.10310  0.025196
  460     33       1     0.411    0.2369   0.09701  0.022831
  473     32       1     0.401    0.2279   0.09107  0.020608
  477     31       1     0.392    0.2190   0.08536  0.018556
  519     29       1     0.381    0.2097   0.07957  0.016559
  520     28       1     0.371    0.2004   0.07393  0.014701
  524     27       1     0.360    0.1912   0.06855  0.013008
  550     25       1     0.349    0.1815   0.06302  0.011348
  558     23       1     0.337    0.1716   0.05749  0.009781
  567     21       1     0.325    0.1616   0.05217  0.008356
  574     20       1     0.312    0.1516   0.04704  0.007067
  583     19       1     0.300    0.1418   0.04224  0.005936
  613     18       1     0.287    0.1321   0.03764  0.004925
  641     17       1     0.273    0.1223   0.03325  0.004028
  643     16       1     0.260    0.1129   0.02919  0.003262
  655     15       1     0.247    0.1038   0.02546  0.002613
  687     14       1     0.234    0.0949   0.02203  0.002068
  689     13       1     0.221    0.0864   0.01892  0.001615
  705     12       1     0.207    0.0778   0.01598  0.001229
  707     11       1     0.193    0.0699   0.01341  0.000926
  731     10       1     0.178    0.0613   0.01085  0.000656
  765      8       1     0.162    0.0525   0.00843  0.000436
  791      7       1     0.146    0.0442   0.00638  0.000278
  814      5       1     0.126    0.0348   0.00434  0.000149
 survival5 survival6 survival7 survival8
     0.997     0.996    0.9934   0.98938
     0.995     0.992    0.9869   0.97884
     0.992     0.988    0.9803   0.96830
     0.990     0.984    0.9737   0.95776
     0.987     0.980    0.9671   0.94721
     0.985     0.975    0.9605   0.93673
     0.982     0.971    0.9538   0.92626
     0.980     0.967    0.9471   0.91577
     0.974     0.959    0.9338   0.89499
     0.972     0.954    0.9271   0.88465
     0.969     0.950    0.9204   0.87431
     0.964     0.942    0.9070   0.85377
     0.961     0.937    0.9003   0.84359
     0.958     0.933    0.8936   0.83346
     0.955     0.929    0.8870   0.82340
     0.953     0.924    0.8803   0.81334
     0.950     0.920    0.8736   0.80336
     0.947     0.916    0.8669   0.79347
     0.944     0.911    0.8603   0.78362
     0.941     0.907    0.8536   0.77382
     0.936     0.898    0.8402   0.75426
     0.933     0.894    0.8335   0.74448
     0.930     0.889    0.8267   0.73474
     0.927     0.885    0.8200   0.72505
     0.924     0.880    0.8132   0.71527
     0.921     0.876    0.8063   0.70554
     0.918     0.871    0.7995   0.69586
     0.915     0.866    0.7926   0.68626
     0.912     0.862    0.7858   0.67674
     0.906     0.853    0.7722   0.65784
     0.897     0.838    0.7516   0.62967
     0.894     0.834    0.7447   0.62035
     0.891     0.829    0.7378   0.61102
     0.887     0.824    0.7309   0.60178
     0.884     0.819    0.7239   0.59247
     0.881     0.814    0.7169   0.58319
     0.878     0.809    0.7099   0.57396
     0.871     0.799    0.6957   0.55556
     0.868     0.794    0.6887   0.54644
     0.864     0.789    0.6813   0.53705
     0.860     0.784    0.6739   0.52762
     0.857     0.778    0.6665   0.51826
     0.853     0.773    0.6590   0.50885
     0.849     0.768    0.6515   0.49951
     0.846     0.762    0.6439   0.49013
     0.842     0.757    0.6363   0.48074
     0.838     0.751    0.6286   0.47130
     0.834     0.745    0.6207   0.46173
     0.830     0.739    0.6126   0.45203
     0.826     0.733    0.6045   0.44246
     0.821     0.727    0.5965   0.43296
     0.817     0.721    0.5884   0.42349
     0.813     0.715    0.5802   0.41397
     0.808     0.708    0.5720   0.40458
     0.804     0.702    0.5638   0.39514
     0.799     0.696    0.5555   0.38583
     0.795     0.689    0.5474   0.37666
     0.790     0.683    0.5391   0.36752
     0.786     0.676    0.5308   0.35841
     0.781     0.670    0.5225   0.34938
     0.771     0.657    0.5058   0.33143
     0.766     0.650    0.4975   0.32264
     0.762     0.643    0.4892   0.31398
     0.757     0.636    0.4808   0.30532
     0.751     0.629    0.4720   0.29633
     0.746     0.622    0.4631   0.28729
     0.740     0.614    0.4540   0.27818
     0.734     0.606    0.4447   0.26907
     0.729     0.599    0.4356   0.26014
     0.723     0.591    0.4264   0.25132
     0.717     0.583    0.4172   0.24262
     0.711     0.575    0.4081   0.23411
     0.705     0.567    0.3991   0.22578
     0.699     0.559    0.3902   0.21768
     0.686     0.544    0.3725   0.20189
     0.680     0.536    0.3637   0.19426
     0.668     0.520    0.3463   0.17939
     0.661     0.512    0.3376   0.17217
     0.655     0.503    0.3287   0.16487
     0.647     0.494    0.3193   0.15727
     0.640     0.485    0.3099   0.14989
     0.633     0.476    0.3007   0.14273
     0.625     0.467    0.2915   0.13570
     0.618     0.458    0.2824   0.12893
     0.610     0.449    0.2734   0.12230
     0.602     0.440    0.2644   0.11588
     0.595     0.431    0.2556   0.10967
     0.586     0.421    0.2464   0.10337
     0.578     0.411    0.2372   0.09720
     0.570     0.402    0.2282   0.09127
     0.560     0.391    0.2188   0.08524
     0.551     0.381    0.2093   0.07936
     0.542     0.370    0.2000   0.07374
     0.531     0.359    0.1902   0.06794
     0.520     0.347    0.1800   0.06214
     0.509     0.335    0.1698   0.05653
     0.497     0.322    0.1596   0.05112
     0.485     0.310    0.1496   0.04604
     0.472     0.297    0.1396   0.04116
     0.459     0.283    0.1295   0.03647
     0.446     0.270    0.1198   0.03214
     0.432     0.257    0.1104   0.02813
     0.418     0.243    0.1012   0.02444
     0.403     0.230    0.0923   0.02107
     0.388     0.216    0.0834   0.01789
     0.373     0.202    0.0751   0.01508
     0.355     0.187    0.0661   0.01227
     0.335     0.170    0.0568   0.00960
     0.315     0.154    0.0481   0.00732
     0.288     0.133    0.0382   0.00504
g <- ggsurvplot(s, conf.int = F)

The plot

g

Discussion of survival curves

=======
plot_predictions(lung.3, condition = c("time", "sex"), type = "survival")

Effect of ph.ecog score:

plot_predictions(lung.3, condition = c("time", "ph.ecog"), type = "survival")

Comments

>>>>>>> 460c7f7e30cb43650b8fd3b6aad3724488382916

Sex and ph.ecog score

<<<<<<< HEAD
tidy(lung.3) %>% select(term, estimate, p.value)
summary(lung.3)
=======
plot_predictions(lung.3, condition = c("time", "ph.ecog", "sex"), type = "survival")

Comments

The summary again

summary(lung.3)
>>>>>>> 460c7f7e30cb43650b8fd3b6aad3724488382916
Call:
coxph(formula = Surv(time, status == 2) ~ sex + ph.ecog, data = lung.complete)

  n= 167, number of events= 120 

           coef exp(coef) se(coef)      z Pr(>|z|)    
sex     -0.5101    0.6004   0.1969 -2.591 0.009579 ** 
ph.ecog  0.4825    1.6201   0.1323  3.647 0.000266 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

        exp(coef) exp(-coef) lower .95 upper .95
sex        0.6004     1.6655    0.4082    0.8832
ph.ecog    1.6201     0.6172    1.2501    2.0998

Concordance= 0.641  (se = 0.031 )
Likelihood ratio test= 19.48  on 2 df,   p=6e-05
Wald test            = 19.35  on 2 df,   p=6e-05
Score (logrank) test = 19.62  on 2 df,   p=5e-05

Comments

Martingale residuals for this model

No problems here:

<<<<<<< HEAD
ggcoxdiagnostics(lung.3) + geom_smooth(se = F)

When the Cox model fails

=======
lung.3 %>% augment(lung.complete) %>% 
  ggplot(aes(x = .fitted, y = .resid)) + geom_point() + geom_smooth()

When the Cox model fails (optional)

>>>>>>> 460c7f7e30cb43650b8fd3b6aad3724488382916
<<<<<<< HEAD
age <- seq(20, 60, 5)
survtime <- c(10, 12, 11, 21, 15, 20, 8, 9, 11)
stat <- c(1, 1, 1, 1, 0, 1, 1, 1, 1)
d <- tibble(age, survtime, stat)
d %>% mutate(y = Surv(survtime, stat)) -> d
=======
age <- seq(20, 60, 5)
survtime <- c(10, 12, 11, 21, 15, 20, 8, 9, 11)
stat <- c(1, 1, 1, 1, 0, 1, 1, 1, 1)
d <- tibble(age, survtime, stat)
d %>% mutate(y = Surv(survtime, stat)) -> d
>>>>>>> 460c7f7e30cb43650b8fd3b6aad3724488382916

Fit Cox model

<<<<<<< HEAD
y.1 <- coxph(y ~ age, data = d)
summary(y.1)
=======
y.1 <- coxph(y ~ age, data = d)
summary(y.1)
>>>>>>> 460c7f7e30cb43650b8fd3b6aad3724488382916
Call:
coxph(formula = y ~ age, data = d)

  n= 9, number of events= 8 

       coef exp(coef) se(coef)     z Pr(>|z|)
age 0.01984   1.02003  0.03446 0.576    0.565

    exp(coef) exp(-coef) lower .95 upper .95
age      1.02     0.9804    0.9534     1.091

Concordance= 0.545  (se = 0.105 )
Likelihood ratio test= 0.33  on 1 df,   p=0.6
Wald test            = 0.33  on 1 df,   p=0.6
Score (logrank) test = 0.33  on 1 df,   p=0.6

Martingale residuals

Down-and-up indicates incorrect relationship between age and survival:

<<<<<<< HEAD
ggcoxdiagnostics(y.1) + geom_smooth(se = F)
=======
y.1 %>% augment(d) %>% 
  ggplot(aes(x = .fitted, y = .resid)) + geom_point() + geom_smooth()
>>>>>>> 460c7f7e30cb43650b8fd3b6aad3724488382916

Attempt 2

Add squared term in age:

<<<<<<< HEAD
y.2 <- coxph(y ~ age + I(age^2), data = d)
tidy(y.2) %>% select(term, estimate, p.value)
=======
y.2 <- coxph(y ~ age + I(age^2), data = d)
tidy(y.2) %>% select(term, estimate, p.value)
>>>>>>> 460c7f7e30cb43650b8fd3b6aad3724488382916

Martingale residuals this time

Not great, but less problematic than before:

<<<<<<< HEAD
ggcoxdiagnostics(y.2) + geom_smooth(se = F)